De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics.

Authors

  • Catherine Adamidi
  • Yongbo Wang
  • Dominic Gruen
  • Guido Mastrobuoni
  • Xintian You
  • Dominic Tolle
  • Matthias Dodt
  • Sebastian D Mackowiak
  • Andreas Gogol-Doering
  • Pinar Oenal
  • Agnieszka Rybak
  • Eric Ross
  • Alejandro Sánchez Alvarado
  • Stefan Kempa
  • Christoph Dieterich
  • Nikolaus Rajewsky
  • Wei Chen
Abstract

Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5' and 3' ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%-13%, 0%-3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database.
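The headline counts in the abstract imply a few derived proportions that are not stated explicitly. The following is a minimal sketch that recomputes them from the quoted figures only; the variable names and the `pct` helper are illustrative assumptions, not part of the authors' pipeline.

```python
# Figures quoted in the abstract; all names here are illustrative.
candidate_transcripts = 18_619   # candidate transcripts after filtering
mapped_transcripts = 17_564      # transcripts mapped to the genome reference
distinct_loci = 15_284           # distinct loci hit by mapped transcripts
race_confirmed, race_tested = 22, 24
peptide_validated = 4_200        # transcripts validated by peptides (FDR <= 1%)


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "mapped_to_genome_pct": pct(mapped_transcripts, candidate_transcripts),
    "race_confirmed_pct": pct(race_confirmed, race_tested),
    "peptide_validated_pct": pct(peptide_validated, candidate_transcripts),
    "transcripts_per_locus": round(mapped_transcripts / distinct_loci, 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these figures, roughly 94.3% of candidate transcripts map to the genome, 91.7% of RACE-tested transcripts have confirmed ends, 22.6% of candidates have peptide support, and each locus carries about 1.15 mapped transcripts on average.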

Similar resources

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

Shotgun Protein Sequencing: Assembly of Peptide Tandem Mass Spectra from Mixtures of Modified Proteins

Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpreta...

De novo detection of A-to-I RNA editing sites in human mRNAs by massive transcriptome sequencing

Motivations RNA editing is a widespread molecular phenomenon which modifies primary transcripts at specific positions [1]. It occurs in a variety of organisms including human and cooperates with alternative splicing in increasing both proteomic and transcriptomic complexity. RNA Editing can modulate gene expression and affect protein functionality. In human, such phenomenon is highly frequent i...

A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data

Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and high sequencing error rates of modern sequencers present many new computational challenges in data interpretation. These challenges have led to the developmen...

Shotgun Protein Sequencing: Assembly of Tandem Mass Spectra from Mixtures of Modified Proteins

Despite significant advances in the identification of known proteins, the analysis of unknown proteins by tandem mass spectrometry (MS/MS) still remains a challenging open problem. Although Klaus Biemann recognized the potential of tandem mass spectrometry (MS/MS) for sequencing of unknown proteins in the 1980s, low-throughput Edman degradation followed by cloning still remains the main method ...

Journal:
  • Genome Research

Volume 21, Issue 7

Publication date: 2011